Motif Extraction using an Improved Iterative Duplication Method for HMM Topology Learning
Abstract
In this paper, hidden Markov models (HMMs) are used to represent motifs. Motifs are sites preserved during evolution and are considered to represent the function or structure of proteins. Representing motifs requires a direct periodic topology and the OR rule: the direct periodic topology is needed to represent cyclic structures, such as global loops for helices, and the OR rule is needed to represent different families or types within a motif. To obtain an optimal HMM topology for a motif, we improved the "iterative duplication method" [1]. In this method, a small fully connected HMM is gradually expanded by state splitting. The initial number of states is set to three in the current implementation. The method iterates through a parameter estimation phase followed by a topology modification phase, and terminates when the accuracy on the model size selection data reaches its maximum. For parameter estimation, the Baum-Welch algorithm is run on several initial HMMs with randomized parameters to avoid converging to a local maximum; the best HMM is then chosen based on the likelihood of the training data. In the topology modification phase, negligible transition deletion, negligible state deletion, splitting state selection, and state splitting are performed. Negligible transition deletion removes transitions whose transition probability falls below the threshold $\epsilon = \max(\epsilon_1, r)$, where $\epsilon_1$ is a smoothing value and $r$ is a convergence radius. Deleting these transitions keeps the topology space small and consequently reduces the computation needed for parameter estimation. Negligible state deletion removes states with negligible initial probabilities and negligible incoming transitions. This state deletion also reduces the training computation and benefits the calculation of the criteria for selecting the state to split. The...
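The two deletion steps of the topology modification phase can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameter names `smoothing` and `radius`, the renormalization of rows after pruning, and the choice to ignore self-loops when checking incoming transitions are all assumptions made for illustration.

```python
def _normalize(row):
    """Rescale a row to sum to 1; rows with no mass left are returned unchanged."""
    total = sum(row)
    return [p / total for p in row] if total > 0 else list(row)


def delete_negligible_transitions(trans, smoothing, radius):
    """Zero out transitions below max(smoothing, radius), as in the abstract.

    Renormalizing the surviving probabilities is an assumption of this sketch.
    """
    eps = max(smoothing, radius)
    pruned = [_normalize([p if p >= eps else 0.0 for p in row]) for row in trans]
    return pruned, eps


def delete_negligible_states(init, trans, eps):
    """Drop states with negligible initial probability AND negligible incoming
    transitions. Self-loops are not counted as incoming (an assumption)."""
    n = len(init)
    keep = [
        j for j in range(n)
        if init[j] >= eps
        or any(trans[i][j] >= eps for i in range(n) if i != j)
    ]
    new_init = _normalize([init[j] for j in keep])
    new_trans = [_normalize([trans[i][j] for j in keep]) for i in keep]
    return keep, new_init, new_trans


# Hypothetical three-state HMM: state 2 is reachable only through a tiny transition.
trans = [
    [0.5, 0.5, 0.0],
    [0.999, 0.0, 0.001],
    [0.2, 0.3, 0.5],
]
init = [0.6, 0.4, 0.0]

pruned, eps = delete_negligible_transitions(trans, smoothing=0.01, radius=0.001)
keep, new_init, new_trans = delete_negligible_states(init, pruned, eps)
print(keep)  # [0, 1]
```

In a full model the emission distributions would have to be filtered with the same `keep` indices; they are omitted here to keep the sketch short.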
Similar resources
Parallel Characteristic Extraction from Protein Sequence Database
An adaptive massively parallel system for flexible information processing has been investigated. This research requires a feedback from the real application. In this paper, a parallel characteristic extraction from the protein sequence database is described. Since the protein sequence database is huge and sequences have variety, an adaptive massively parallel system is mandatory. An HMM (hidden...
Stochastic Motif Extraction Using Hidden Markov Model
In this paper, we study the application of an HMM (hidden Markov model) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the small segments of protein sequences that have a certain function or structure. The stochastic motif, represented by an HMM, has conditional probabilities to deal with the stochastic nature of the motif. This HMM...
Alert correlation and prediction using data mining and HMM
Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...
Prediction of Mitochondrial Targeting Signals Using Hidden Markov Model.
The mitochondrial targeting signal (MTS) is the presequence that directs nascent proteins bearing it to mitochondria. We have developed a hidden Markov model (HMM) that represents various known sequence characteristics of MTSs, such as the length variation, amino acid composition, amphiphilicity, and consensus pattern around the cleavage site. The topology and parameters of this model are autom...
Iterative learning identification and control for dynamic systems described by NARMAX model
A new iterative learning controller is proposed for a general unknown discrete time-varying nonlinear non-affine system represented by NARMAX (Nonlinear Autoregressive Moving Average with eXogenous inputs) model. The proposed controller is composed of an iterative learning neural identifier and an iterative learning controller. Iterative learning control and iterative learning identification ar...
Publication date: 1995